

Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

Neural Information Processing Systems

The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes, using temporal-difference (TD) methods with linear function approximation for scalability. The agents cooperate to estimate the value function of such a process by observing continual state transitions of a shared environment over a graph of interconnected nodes (agents), along with locally private rewards. Different from existing consensus-type TD algorithms, the approach here develops a simple decentralized TD tracker by wedding TD learning with gradient tracking techniques. The non-asymptotic properties of the novel TD tracker are established for both independent and identically distributed (i.i.d.) and Markovian transitions through a unifying multistep Lyapunov analysis. In contrast to the prior art, the novel algorithm's error bounds do not degrade with the number of agents, which endows it with performance comparable to that of the sharpest known centralized TD methods.
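The paper's exact update rules are not reproduced in this abstract, but the idea of wedding TD(0) with gradient tracking can be sketched as follows. The synthetic Markov reward process, the ring-topology mixing matrix `W`, the step size, and the initialization below are all illustrative assumptions, not the authors' precise algorithm: each agent mixes its iterate with neighbors and moves along a tracker `y` that estimates the network-average TD direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative synthetic MRP under a fixed policy (assumption) ---
n_states, n_feat, n_agents = 5, 3, 4
gamma, alpha = 0.9, 0.02
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transitions
Phi = rng.standard_normal((n_states, n_feat))         # feature matrix
R = rng.standard_normal((n_agents, n_states))         # locally private rewards

# Doubly stochastic mixing matrix for a ring of agents (Metropolis-style weights)
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in ((i - 1) % n_agents, (i + 1) % n_agents):
        W[i, j] = 1.0 / 3.0
    W[i, i] = 1.0 - W[i].sum()

def td_grad(theta, s, s_next, r):
    """Local TD(0) semi-gradient with linear function approximation."""
    delta = r + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    return delta * Phi[s]

theta = np.zeros((n_agents, n_feat))   # one parameter vector per agent
s = 0
s_next = rng.choice(n_states, p=P[s])
# Initialize each tracker with the agent's first local TD direction
y = np.array([td_grad(theta[i], s, s_next, R[i, s]) for i in range(n_agents)])
s = s_next

for t in range(3000):
    s_next = rng.choice(n_states, p=P[s])       # shared Markovian sample
    g_old = np.array([td_grad(theta[i], s, s_next, R[i, s])
                      for i in range(n_agents)])
    theta_new = W @ theta + alpha * y           # consensus step + tracked direction
    g_new = np.array([td_grad(theta_new[i], s, s_next, R[i, s])
                      for i in range(n_agents)])
    y = W @ y + g_new - g_old                   # gradient-tracking recursion
    theta = theta_new
    s = s_next

# Agents should roughly agree despite seeing only private rewards
disagreement = np.linalg.norm(theta - theta.mean(axis=0))
print(theta.shape, np.isfinite(disagreement))
```

The tracker recursion `y ← W y + g_new − g_old` is the standard gradient-tracking device: summing it over agents shows that the average of the `y[i]` always equals the average of the most recent local TD directions, which is what lets the scheme avoid error bounds that grow with the number of agents.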


Review for NeurIPS paper: Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

Neural Information Processing Systems

Weaknesses: The main weakness of this paper is that a very similar problem was considered in reference [31], which has a nearly identical title. The main difference between this paper and [31] seems to be in the method: this paper uses gradient tracking, while [31] does not. Nevertheless, both papers show convergence to a neighborhood of the optimal solution, so the theoretical innovation in this paper is not sufficient for NeurIPS publication. The authors mention that the empirical results of this paper are better, but this paper seems focused on theory with a relatively brief simulation section, so it should be evaluated as a theory paper. Additionally, the results here fall short of what one wants to achieve in this setting, which is convergence to the optimal solution.


Review for NeurIPS paper: Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

Neural Information Processing Systems

This paper is theoretical work that provides a finite-time analysis for decentralised TD learning. The reviewers and I, although not unanimously, think this contribution may be significant and interesting to the community, given the recent interest in the finite-time analysis of TD algorithms with (linear) function approximation. We ask the authors to address the required changes in the manuscript. The authors propose a distributed method for policy evaluation. The reviewers and I were not convinced that this paper proposes a novel method, specifically due to the lack of a proper comparison with previous work.

